Sarnia
Tree Search for LLM Agent Reinforcement Learning
Ji, Yuxiang, Ma, Ziyu, Wang, Yong, Chen, Guanhua, Chu, Xiangxiang, Wu, Liaoni
Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term and multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from the problem of sparse supervision. To address the challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree search, where each tree node represents the complete agent interaction step. By sharing common prefixes, the tree search sampling increases the number of rollouts achievable within a fixed budget of tokens or tool calls. Moreover, we find that the tree-structured trajectory naturally allows the construction of step-wise process supervised signals even using only the outcome reward. Based on this, Tree-GRPO estimates the grouped relative advantages both on intra-tree and inter-tree levels. Through theoretical analysis, we demonstrate that the objective of intra-tree level group relative policy optimization is equivalent to that of step-level direct preference learning. Experiments across 11 datasets and 3 types of QA tasks demonstrate the superiority of the proposed tree-based RL over the chain-based RL method.
Figure 1: Comparison of chain-based and tree-based sampling strategies in LLM multi-turn agent RL. The tree structure brings two major advantages: (i) less rollout budget (both on tokens and tool-calls); (ii) higher performance.
Reinforcement Learning (RL) has emerged as a pivotal post-training paradigm for Large Language Models (LLMs), catalyzing the development of several frontier models (DeepSeek-AI Team, 2025; Yang et al., 2025a; OpenAI, 2024). RL-tuned LLMs trained only with outcome rewards acquire complex reasoning abilities and achieve remarkable gains in single-turn tasks, such as mathematical proof and code generation (Team et al., 2025b; Yu et al., 2025; Chu et al., 2025a; Shao et al., 2024; Xin et al., 2024).
This suggests that LLMs can learn not only through static imitation, but also by actively interacting with dynamic environments. Guided by this prospect, recent works have extended this RL paradigm to more complex agent settings involving dynamic, multi-turn interactions (Feng et al., 2025b; Singh et al., 2025; Wang et al., 2025b; Qian et al., 2025; Feng et al.,
Work done during internship at AMAP, Alibaba Group.
Right (Ours): Tree search with nodes corresponding to complete agent step.
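The abstract's idea of estimating group-relative advantages at both intra-tree and inter-tree levels can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `group_relative` and `tree_advantages` are hypothetical, each tree is reduced to a flat list of leaf outcome rewards, the normalization is the usual GRPO-style (reward minus group mean, divided by group standard deviation), and the two levels are simply summed.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-6):
    """GRPO-style normalization: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def tree_advantages(trees, eps=1e-6):
    """Combine intra-tree and inter-tree advantages for every leaf.

    `trees` is a list of trees, each given as a list of leaf outcome rewards.
    Assumption: the two levels are summed; the abstract does not say how
    they are combined.
    """
    # Intra-tree level: each leaf is compared with the other leaves of its tree.
    intra = [group_relative(tree, eps) for tree in trees]

    # Inter-tree level: each leaf is compared with all leaves of all trees.
    flat = [r for tree in trees for r in tree]
    inter_flat = group_relative(flat, eps)

    combined = []
    offset = 0
    for tree_adv in intra:
        combined.append(
            [a + inter_flat[offset + j] for j, a in enumerate(tree_adv)]
        )
        offset += len(tree_adv)
    return combined


if __name__ == "__main__":
    # Two trees with two leaves each; only one leaf earned the outcome reward.
    for tree in tree_advantages([[1.0, 0.0], [0.0, 0.0]]):
        print([round(a, 3) for a in tree])
```

In this toy input the second tree has identical leaf rewards, so its intra-tree term is zero and only the inter-tree comparison provides a signal. That is one plausible reading of why two levels of grouping might be useful.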
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Southern Ocean (0.04)
- North America > Canada > Ontario > Lambton County > Sarnia (0.04)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Media > Music (0.68)
- Leisure & Entertainment > Sports > Football (0.68)
WeatherArchive-Bench: Benchmarking Retrieval-Augmented Reasoning for Historical Weather Archives
Yu, Yongan, Du, Xianda, Hu, Qingchen, Liang, Jiahao, Ni, Jingwei, Qiang, Dan, Huang, Kaiyu, McKenzie, Grant, Sieber, Renee, Mo, Fengran
Historical archives on weather events are collections of enduring primary source records that offer rich, untapped narratives of how societies have experienced and responded to extreme weather events. These qualitative accounts provide insights into societal vulnerability and resilience that are largely absent from meteorological records, making them valuable for climate scientists to understand societal responses. However, their vast scale, noisy digitized quality, and archaic language make it difficult to transform them into structured knowledge for climate research. To address this challenge, we introduce WeatherArchive-Bench, the first benchmark for evaluating retrieval-augmented generation (RAG) systems on historical weather archives. WeatherArchive-Bench comprises two tasks: WeatherArchive-Retrieval, which measures a system's ability to locate historically relevant passages from over one million archival news segments, and WeatherArchive-Assessment, which evaluates whether Large Language Models (LLMs) can classify societal vulnerability and resilience indicators from extreme weather narratives. Extensive experiments across sparse, dense, and re-ranking retrievers, as well as a diverse set of LLMs, reveal that dense retrievers often fail on historical terminology, while LLMs frequently misinterpret vulnerability and resilience concepts. These findings highlight key limitations in reasoning about complex societal indicators and provide insights for designing more robust climate-focused RAG systems from archival contexts. The constructed dataset and evaluation framework are publicly available at https://anonymous.4open.science/r/WeatherArchive-Bench/.
- North America > Canada > Quebec > Montreal (0.14)
- Europe > Austria > Vienna (0.14)
- North America > Canada > Ontario > Toronto (0.05)
- Banking & Finance (0.93)
- Information Technology (0.93)
- Health & Medicine (0.67)
Deep Learning-Based Analysis of Power Consumption in Gasoline, Electric, and Hybrid Vehicles
Yahyaabadi, Roksana, Farhani, Ghazal, Rahman, Taufiq, Nikan, Soodeh, Jirjees, Abdullah, Araji, Fadi
Accurate power consumption prediction is crucial for improving efficiency and reducing environmental impact, yet traditional methods relying on specialized instruments or rigid physical models are impractical for large-scale, real-world deployment. This study introduces a scalable data-driven method using powertrain dynamic feature sets and both traditional machine learning and deep neural networks to estimate instantaneous and cumulative power consumption in internal combustion engine (ICE), electric vehicle (EV), and hybrid electric vehicle (HEV) platforms. ICE models achieved high instantaneous accuracy with mean absolute error and root mean squared error on the order of $10^{-3}$, and cumulative errors under 3%. Transformer and long short-term memory models performed best for EVs and HEVs, with cumulative errors below 4.1% and 2.1%, respectively. Results confirm the approach's effectiveness across vehicles and models. Uncertainty analysis revealed greater variability in EV and HEV datasets than ICE, due to complex power management, emphasizing the need for robust models for advanced powertrains.
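The error measures named in this abstract (mean absolute error, root mean squared error, and cumulative error) can be made concrete with a short Python sketch. The function names are hypothetical, and the definition of cumulative error used here (absolute difference between summed predictions and summed measurements, as a percentage of the measured total) is an assumption, since the abstract does not give its formula.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error over instantaneous samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error over instantaneous samples."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def cumulative_error_pct(y_true, y_pred):
    """Assumed definition: |sum(pred) - sum(true)| / |sum(true)| * 100."""
    total_true = sum(y_true)
    return abs(sum(y_pred) - total_true) / abs(total_true) * 100.0


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0]   # made-up instantaneous power values
    predicted = [1.5, 2.0, 2.5]  # made-up model outputs
    print(mae(measured, predicted))
    print(rmse(measured, predicted))
    print(cumulative_error_pct(measured, predicted))
```

With these made-up values the instantaneous errors are nonzero, but the over- and under-predictions cancel in the sum, so the cumulative error is zero. Under the assumed definition, this is why instantaneous and cumulative accuracy are worth reporting separately.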
- North America > Canada > Ontario > Middlesex County > London (0.14)
- North America > United States (0.14)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- Transportation > Ground > Road (1.00)
- Transportation > Electric Vehicle (1.00)
- Automobiles & Trucks (1.00)
Self-driving car demo is the first to cross the US-Canada border
As a rule, self-driving car tests tend to be limited to the country where they started. But that's not how people drive -- what happens when your autonomous vehicle crosses the border? Continental and Magna plan to find out. The machines won't be in complete control for the entire route, but they'll use a combination of cameras, lidar and radar to take over when they can, including two key border crossings (the Detroit-Windsor Tunnel and the Blue Water Bridge). This isn't the first autonomous driving-related agreement involving Michigan and Ontario, but it's an important one: it'll explore rules and regulations in addition to usual self-driving data collection.
- North America > United States > Michigan (0.34)
- North America > Canada > Ontario > Lambton County > Sarnia (0.09)
- Transportation > Ground > Road (0.99)
- Information Technology > Robotics & Automation (0.99)
- Automobiles & Trucks (0.99)
- Transportation > Passenger (0.65)